ZEPPELIN-3375: Make PySparkInterpreter extends PythonInterpreter by zjffdu · Pull Request #2919 · apache/zeppelin

zjffdu · 2018-04-11T02:37:38Z

What is this PR for?

This PR removes the code duplication between PySparkInterpreter and PythonInterpreter. The main changes are:

- PySparkInterpreter extends PythonInterpreter.
- PySparkInterpreterTest extends PythonInterpreterTest, so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do.
- Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into the python module, so that the python module can ship these resources together.

What type of PR is it?

[ Improvement | Refactoring]

Todos

- Task

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-3375

How should this be tested?

CI pass

Screenshots (if appropriate)

Questions:

Do the license files need an update? No
Are there breaking changes for older versions? No
Does this need documentation? No

zjffdu · 2018-04-11T06:12:15Z

@felixcheung @Leemoonsoo Could you help review it? Thanks.

felixcheung · 2018-04-11T06:50:51Z

python/src/main/java/org/apache/zeppelin/python/IPythonInterpreter.java

+    //TODO(zjffdu) don't do hard code on py4j here
+    File py4jDestFile = new File(pythonWorkDir, "py4j-src-0.9.2.zip");
+    FileUtils.copyURLToFile(getClass().getClassLoader().getResource(
+        "python/py4j-src-0.9.2.zip"), py4jDestFile);


Yeah, Spark 2.3 is running with Py4J 0.10.6.

Should this detect a mismatch here, for example by checking the Spark version?

It is fine to use py4j 0.9.2 here for IPythonInterpreter; IPySparkInterpreter would use Spark's own py4j instead of py4j 0.9.2.

I wonder why Spark doesn't ship the py4j zip file under a version-agnostic name. Filed https://issues.apache.org/jira/browse/SPARK-23965

I don't think this is a strong reason to rename or add a link for Py4J within Spark. Also, to be clear, I think it's orthogonal to the current change here, if I am not mistaken.
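
One way to avoid hard-coding the Py4J version when Spark is available is to look for whatever zip Spark bundles. The following is only a sketch: it assumes the usual `python/lib/py4j-*-src.zip` layout under the Spark home directory, and `find_py4j_zip` is a hypothetical helper, not part of Zeppelin or Spark.

```python
import glob
import os


def find_py4j_zip(spark_home):
    """Return the path of a Py4J source zip under spark_home, or None.

    Assumes the layout <spark_home>/python/lib/py4j-<version>-src.zip.
    """
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    # Return the first match in sorted order; None if nothing is bundled.
    return matches[0] if matches else None
```

A caller could fall back to a bundled default zip when this returns `None`.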

felixcheung · 2018-04-11T06:51:57Z

python/src/main/java/org/apache/zeppelin/python/PythonCondaInterpreter.java

+      throw new IOException("Fail to run shell commands: " + StringUtils.join(commands, " "));
+    }
+    logger.info("Complete shell commands: " + StringUtils.join(commands, " "));
+    return outputGobbler.getOutput();


I thought we had some launcher wrapper for something like this?

felixcheung · 2018-04-11T06:52:49Z

zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/InterpreterGroup.java

+        try {
+          interpreter.close();
+        } catch (InterpreterException e) {
+          LOGGER.warn("Fail to close interpreter: " + interpreter.getClassName());


Would the exception stack be useful?
Just pass it along: LOGGER.warn( .... , e);?

felixcheung · 2018-04-11T06:54:39Z

spark/interpreter/src/main/resources/python/zeppelin_pyspark.py

-
-jsc = intp.getJavaSparkContext()
-
-if sparkVersion.isImportAllPackageUnderSparkSql():


Why not keep this?

It is only used when the Spark version is lower than 1.3. There is a lot of code in Zeppelin that targets specific old Spark versions. I don't think we need it; we have no tests for it, so no one knows whether it works or not. I think it is time for Zeppelin to drop support for old versions of Spark, but that requires more work, so I will do it in another PR.

felixcheung · 2018-04-11T06:55:20Z

spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java

+      try {
+        bootstrapInterpreter("python/zeppelin_pyspark.py");
+      } catch (IOException e) {
+        e.printStackTrace();


felixcheung · 2018-04-11T06:58:27Z

python/src/main/resources/python/zeppelin_python.py

      except:
-        raise Exception(traceback.format_exc())
+        if not isForCompletion:
+          exception = traceback.format_exc()


Add a comment on what this is looking for and what it typically looks like?
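
To illustrate what the captured value typically looks like, here is a minimal sketch of the pattern in the diff. `run_snippet` and `is_for_completion` are hypothetical names used for illustration only; this is not the code in zeppelin_python.py.

```python
import traceback


def run_snippet(source, namespace, is_for_completion=False):
    """Execute source in namespace.

    Returns the formatted traceback string on failure, or None on success.
    Failures are swallowed silently when the run is only for completion.
    """
    try:
        exec(compile(source, "<paragraph>", "exec"), namespace)
    except Exception:
        if not is_for_completion:
            # A multi-line string that starts with
            # "Traceback (most recent call last):" and ends with the
            # exception line, e.g. "ZeroDivisionError: division by zero".
            return traceback.format_exc()
    return None
```

Calling `run_snippet("1/0", {})` returns such a traceback string, while the same call with `is_for_completion=True` returns `None`.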

felixcheung · 2018-04-11T07:00:02Z

python/src/main/resources/python/zeppelin_python.py

+gateway = JavaGateway(client, auto_convert = True)
+intp = gateway.entry_point
+# redirect stdout/stderr to java side so that PythonInterpreter can capture the python execution result
+output = Logger()


There are reported problems with names like these being too common and conflicting with existing variables (output, gateway, client, etc.). We should name these uniquely if we can, even if they are "temporary", since in Python these variables have global scope.

It is fine to use them, because they are not in the same namespace as user code (they are not visible to users).

I'm not sure. This sets them globally, right? For example, z is accessible from user code.

User code runs in the namespace _zcUserQueryNameSpace instead of the global namespace: https://github.com/zjffdu/zeppelin/blob/ZEPPELIN-3375/python/src/main/resources/python/zeppelin_python.py#L91
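
The isolation described above can be sketched with plain `exec` and a dedicated dictionary. This is an illustration only: `user_namespace` is a hypothetical stand-in for `_zcUserQueryNameSpace`, and `run_user_code` is not a Zeppelin function.

```python
# Module-level names, standing in for the ones the bootstrap script defines.
gateway = "bootstrap gateway"
output = "bootstrap output"

# Hypothetical stand-in for _zcUserQueryNameSpace.
user_namespace = {}


def run_user_code(source):
    """Execute a paragraph of user code inside the dedicated namespace."""
    exec(compile(source, "<paragraph>", "exec"), user_namespace)


# The assignment lands in user_namespace, not in the module globals.
run_user_code("output = 'user output'")
```

After this runs, the module-level `output` is unchanged and `user_namespace["output"]` holds the user's value. A name is visible to user code only if it is placed in `user_namespace` explicitly.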

felixcheung · 2018-04-11T07:01:03Z

python/src/main/resources/python/zeppelin_python.py

+      return None
+    else:
+      objectDefList = execResult['objectDefList']
+      return [completion for completion in execResult['objectDefList'] if completion.startswith(methodName)]


startswith: I think that in many cases a partial match, not necessarily from the beginning, could be good?

Maybe. There's a lot of work to do to improve code completion. This PR is already large, and I don't want to cover too much in a single PR.
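
For comparison, the two matching strategies under discussion can be sketched as follows. Both function names are hypothetical, and the candidate list is made up for illustration.

```python
def complete_prefix(candidates, fragment):
    """Keep candidates that begin with fragment (the behavior in the diff)."""
    return [c for c in candidates if c.startswith(fragment)]


def complete_substring(candidates, fragment):
    """Keep candidates that contain fragment anywhere (the suggested variant)."""
    return [c for c in candidates if fragment in c]


candidates = ["read_csv", "to_csv", "csv_reader"]
prefix_hits = complete_prefix(candidates, "csv")        # ["csv_reader"]
substring_hits = complete_substring(candidates, "csv")  # all three names
```

Substring matching returns a superset of the prefix matches, so it trades a longer suggestion list for fewer missed completions.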

zjffdu · 2018-04-16T03:30:04Z

Will merge if there are no more comments.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter

(cherry picked from commit 0a97446)
(cherry picked from commit 595d45b)

zjffdu force-pushed the ZEPPELIN-3375 branch 2 times, most recently from 6116344 to d6481ea Compare April 11, 2018 05:34

felixcheung reviewed Apr 11, 2018

zjffdu force-pushed the ZEPPELIN-3375 branch 3 times, most recently from 91ef36d to 4d30a5e Compare April 13, 2018 09:30

ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter

738c6c5

zjffdu force-pushed the ZEPPELIN-3375 branch from 4d30a5e to 738c6c5 Compare April 13, 2018 09:57

asfgit closed this in 0a97446 Apr 16, 2018

